Visual abstraction
SEVA: Leveraging sketches to evaluate alignment between human and machine visual abstraction
Sketching is a powerful tool for creating abstract images that are sparse but meaningful. Sketch understanding poses fundamental challenges for general-purpose vision algorithms because it requires robustness to the sparsity of sketches relative to natural visual inputs and because it demands tolerance for semantic ambiguity, as sketches can reliably evoke multiple meanings. While current vision algorithms have achieved high performance on a variety of visual tasks, it remains unclear to what extent they understand sketches in a human-like way. Here we introduce SEVA, a new benchmark dataset containing approximately 90K human-generated sketches of 128 object concepts produced under different time constraints, and thus systematically varying in sparsity. We evaluated a suite of state-of-the-art vision algorithms on their ability to correctly identify the target concept depicted in these sketches and to generate responses that are strongly aligned with human response patterns on the same sketch recognition task. We found that vision algorithms that better predicted human sketch recognition performance also better approximated human uncertainty about sketch meaning, but there remains a sizable gap between model and human response patterns. To explore the potential of models that emulate human visual abstraction in generative tasks, we conducted further evaluations of a recently developed sketch generation algorithm (Vinker et al., 2022) capable of generating sketches that vary in sparsity. We hope that public release of this dataset and evaluation protocol will catalyze progress towards algorithms with enhanced capacities for human-like visual abstraction.
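To make the recognition evaluation concrete, here is a minimal sketch of this kind of test, assuming a CLIP-style zero-shot classifier; the concept list, prompt template, and sketch path are illustrative stand-ins, not SEVA's actual interface.

```python
# Minimal sketch of a zero-shot sketch-recognition evaluation, assuming a
# CLIP-style model. The concept list is a stand-in for SEVA's 128 concepts.
import torch
import clip  # https://github.com/openai/CLIP
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

concepts = ["airplane", "bear", "bicycle"]  # stand-in for the 128 concepts
text = clip.tokenize([f"a sketch of a {c}" for c in concepts]).to(device)

def model_response_distribution(sketch_path: str) -> torch.Tensor:
    """Return the model's probability distribution over candidate concepts."""
    image = preprocess(Image.open(sketch_path)).unsqueeze(0).to(device)
    with torch.no_grad():
        logits_per_image, _ = model(image, text)
    return logits_per_image.softmax(dim=-1).squeeze(0)
```

Alignment with humans can then be scored by comparing this distribution to the empirical distribution of human responses for the same sketch, e.g. via KL divergence or correlation aggregated across sketches.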
What Makes a Maze Look Like a Maze?
Hsu, Joy, Mao, Jiayuan, Tenenbaum, Joshua B., Goodman, Noah D., Wu, Jiajun
A unique aspect of human visual understanding is the ability to flexibly interpret abstract concepts: acquiring lifted rules explaining what they symbolize, grounding them across familiar and unfamiliar contexts, and making predictions or reasoning about them. While off-the-shelf vision-language models excel at making literal interpretations of images (e.g., recognizing object categories such as tree branches), they still struggle to make sense of such visual abstractions (e.g., how an arrangement of tree branches may form the walls of a maze). To address this challenge, we introduce Deep Schema Grounding (DSG), a framework that leverages explicit structured representations of visual abstractions for grounding and reasoning. At the core of DSG are schemas--dependency graph descriptions of abstract concepts that decompose them into more primitive-level symbols. DSG uses large language models to extract schemas, then hierarchically grounds concrete to abstract components of the schema onto images with vision-language models. The grounded schema is used to augment visual abstraction understanding. We systematically evaluate DSG and different methods in reasoning on our new Visual Abstractions Dataset, which consists of diverse, real-world images of abstract concepts and corresponding question-answer pairs labeled by humans. We show that DSG significantly improves the abstract visual reasoning performance of vision-language models, and is a step toward human-aligned understanding of visual abstractions.
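As a rough illustration of the schema idea, the sketch below represents an abstract concept as a dependency graph of more primitive symbols and grounds it onto an image bottom-up, concrete components before abstract ones; the dataclasses and the vlm.ground call are hypothetical placeholders, not the paper's API.

```python
# Illustrative sketch of a schema: an abstract concept decomposed into a
# dependency graph of primitive symbols, grounded bottom-up onto an image.
# SchemaNode, Schema, and vlm.ground are hypothetical, not DSG's actual code.
from dataclasses import dataclass, field

@dataclass
class SchemaNode:
    symbol: str                                  # e.g. "wall", "path", "entrance"
    depends_on: list["SchemaNode"] = field(default_factory=list)

@dataclass
class Schema:
    concept: str                                 # e.g. "maze"
    root: SchemaNode

def ground_schema(schema: Schema, image, vlm) -> dict:
    """Ground concrete symbols first, then the abstract ones built on them."""
    groundings: dict[str, object] = {}

    def visit(node: SchemaNode):
        for dep in node.depends_on:              # ground prerequisites first
            visit(dep)
        context = {d.symbol: groundings[d.symbol] for d in node.depends_on}
        groundings[node.symbol] = vlm.ground(node.symbol, image, context)

    visit(schema.root)
    return groundings
```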
From Language Games to Drawing Games
Fernando, Chrisantha, Zenkova, Daria, Nikolov, Stanislav, Osindero, Simon
Sadly, no other animal represents the world with language or drawing. Early examples of drawing date back 60,000 years [19], though red pigments for mark-making are found as early as 200,000 years ago, in the Middle Stone Age [18]. "The first man to make a mammoth appear on the wall of a cave was, I am confident, amazed by what he had done," writes Gibson, because he had discovered that by means of lines he could delineate something ([10], p. 263). What allowed humans to learn to create (visual) abstractions, e.g., the Western child's human stick figure, Australian aboriginal top-down projections of people seated around a fireplace, the Egyptian formalism for representing things in orthographic projection with multiple station-points and with social dominance relations transformed into size differences, or 16th-century Japanese affine projections with a bird's-eye viewpoint? Building our own abstraction-creating (drawing) machines is one way to find out. A wonderful start was made by Harold Cohen's abstract drawing program "Aaron" [6]. Together, Aaron and Harold produced beautiful and interesting abstracted drawings that looked as if they had been made by a human alone. But Aaron had no capacity for learning, was not conditioned on looking at the world, and was an entirely hand-designed production system (a complex set of hierarchical rules for drawing).
Label Efficient Visual Abstractions for Autonomous Driving
Behl, Aseem, Chitta, Kashyap, Prakash, Aditya, Ohn-Bar, Eshed, Geiger, Andreas
It is well known that semantic segmentation can be used as an effective intermediate representation for learning driving policies. However, the task of street scene semantic segmentation requires expensive annotations. Furthermore, segmentation algorithms are often trained irrespective of the actual driving task, using auxiliary image-space loss functions which are not guaranteed to maximize driving metrics such as safety or distance traveled per intervention. In this work, we seek to quantify the impact of reducing segmentation annotation costs on learned behavior cloning agents. We analyze several segmentation-based intermediate representations and use these visual abstractions to systematically study the trade-off between annotation efficiency and driving performance, varying the types of classes labeled, the number of image samples used to learn the visual abstraction model, and their granularity (e.g., object masks vs. 2D bounding boxes). Our analysis uncovers several practical insights into how segmentation-based visual abstractions can be exploited in a more label-efficient manner. Surprisingly, we find that state-of-the-art driving performance can be achieved with an orders-of-magnitude reduction in annotation cost. Beyond label efficiency, we find several additional training benefits when leveraging visual abstractions, such as a significant reduction in the variance of the learned policy when compared to state-of-the-art end-to-end driving models.
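The two-stage pipeline the paper studies can be sketched as follows: a cheaply supervised segmentation network produces the visual abstraction, and a behavior-cloning policy maps that abstraction to controls. Module names and shapes here are illustrative assumptions, not the authors' code.

```python
# Conceptual sketch of segmentation-as-intermediate-representation driving.
# Stage 1 abstracts the image into a per-pixel class map; stage 2 is a policy
# trained by behavior cloning on that abstraction alone.
import torch
import torch.nn as nn

class DrivingPolicy(nn.Module):
    def __init__(self, num_classes: int = 6, num_actions: int = 3):
        super().__init__()
        # Stage 1 stand-in: any segmentation model trained on a reduced label
        # set (fewer classes, fewer images, or boxes instead of masks).
        self.segmenter = nn.Conv2d(3, num_classes, kernel_size=1)
        # Stage 2: policy that never sees raw pixels, only the abstraction.
        self.policy = nn.Sequential(
            nn.Conv2d(num_classes, 32, 5, stride=4), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, num_actions),  # e.g. steer, throttle, brake
        )

    def forward(self, rgb: torch.Tensor) -> torch.Tensor:
        abstraction = self.segmenter(rgb).softmax(dim=1)  # per-pixel classes
        return self.policy(abstraction)
```

Because the policy consumes only the class map, the annotation budget for stage 1 can be varied independently of the driving objective, which is exactly the trade-off the paper quantifies.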
Shared Visual Abstractions
This paper presents abstract art created by neural networks that is broadly recognizable across various computer vision systems. The existence of abstract forms that trigger specific labels independent of neural architecture or training set suggests that convolutional neural networks build shared visual representations for the categories they understand. Computer vision classifiers encountering these drawings often produce strong responses for specific labels, in extreme cases stronger than for any example from the validation set. By surveying human subjects, we confirm that these abstract artworks are also broadly recognizable by people, suggesting that the visual representations triggered by these drawings are shared across human and computer vision systems.
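One plausible way to search for such shared abstractions, sketched below under assumptions, is generic activation maximization against an ensemble: optimize an image so that several independently trained classifiers all assign high probability to the same label. The paper's actual generation procedure may differ.

```python
# Hedged sketch: optimize an image that multiple architectures agree on.
import torch
import torchvision.models as models

target_class = 954  # ImageNet index for "banana", chosen arbitrarily
classifiers = [
    models.resnet18(weights="IMAGENET1K_V1").eval(),
    models.vgg11(weights="IMAGENET1K_V1").eval(),
]

image = torch.randn(1, 3, 224, 224, requires_grad=True)
optimizer = torch.optim.Adam([image], lr=0.05)

for step in range(200):
    optimizer.zero_grad()
    # Sum the target-class log-probabilities across classifiers, so the image
    # must be "recognizable" to every architecture at once, not just one.
    loss = -sum(
        torch.log_softmax(clf(image), dim=1)[0, target_class]
        for clf in classifiers
    )
    loss.backward()
    optimizer.step()
```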
Pragmatic inference and visual abstraction enable contextual flexibility during visual communication
Fan, Judith, Hawkins, Robert, Wu, Mike, Goodman, Noah
Visual modes of communication are ubiquitous in modern life. Here we investigate drawing, the most basic form of visual communication. Communicative drawing poses a core challenge for theories of how vision and social cognition interact, requiring a detailed understanding of how sensory information and social context jointly determine what information is relevant to communicate. Participants (N=192) were paired in an online environment to play a sketching-based reference game. On each trial, both participants were shown the same four objects, but in different locations. The sketcher's goal was to draw one of these objects - the target - so that the viewer could select it from the array. There were two types of trials: close, where objects belonged to the same basic-level category, and far, where objects belonged to different categories. We found that people exploited information in common ground with their partner to efficiently communicate about the target: on far trials, sketchers achieved high recognition accuracy while applying fewer strokes, using less ink, and spending less time on their drawings than on close trials. We hypothesized that humans succeed in this task by recruiting two core competencies: (1) visual abstraction, the capacity to perceive the correspondence between an object and a drawing of it; and (2) pragmatic inference, the ability to infer what information would help a viewer distinguish the target from distractors. To evaluate this hypothesis, we developed a computational model of the sketcher that embodied both competencies, instantiated as a deep convolutional neural network nested within a probabilistic program. We found that this model fit human data well and outperformed lesioned variants, providing an algorithmically explicit theory of how perception and social cognition jointly support contextual flexibility in visual communication.
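The pragmatic component of such a model can be sketched in the style of a Rational Speech Acts speaker: the sketcher prefers sketches that a simulated viewer would recognize as the target rather than a distractor, traded off against production cost (strokes, ink, time). The recognizer scores below are an assumed stand-in for the paper's CNN-based viewer; alpha and beta are illustrative rationality and cost-sensitivity parameters.

```python
# RSA-style pragmatic sketcher, sketched under stated assumptions.
import numpy as np

def viewer_posterior(recognizer_scores: np.ndarray) -> np.ndarray:
    """P(object | sketch) over the objects in context, from raw scores."""
    e = np.exp(recognizer_scores - recognizer_scores.max())
    return e / e.sum()

def sketcher_distribution(score_matrix: np.ndarray, target: int,
                          costs: np.ndarray, alpha: float = 2.0,
                          beta: float = 1.0) -> np.ndarray:
    """P(sketch | target, context): rows of score_matrix are candidate
    sketches, columns are the objects shown in context."""
    informativity = np.array(
        [np.log(viewer_posterior(row)[target]) for row in score_matrix]
    )
    utility = alpha * informativity - beta * costs
    e = np.exp(utility - utility.max())
    return e / e.sum()
```

In a "close" context of similar distractors, informativity favors detailed, costlier sketches; in a "far" context, sparse sketches already suffice, matching the behavioral pattern reported above.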